source("explorationFunction.R")
library(ggplot2) # data viz
library(dplyr) # data manipulation
library(corrplot) # visualizing correlation matrices
library(ltm) # Item Response Theory -> biserial.cor
library(vcd) # Categorical Data Visualization -> assocstats
library(DescTools) # miscellaneous functions for descriptive statistics -> PercTable
library(ggridges) # ridge plots -> geom_density_ridges
library(plotly) # for plotly interactive visualization
library(tidyr) # for pivot_longer
COVID-19 has affected the whole world, causing widespread lockdowns and major disruptions in many areas of life. People had to stay home and avoid social contact to slow the spread of the virus, which changed how we live and interact. Students, especially in the Delhi-National Capital Region (NCR), have been particularly affected. Schools had to switch quickly to online learning, creating serious challenges for traditional education and highlighting the need to understand how this change impacts students’ lives.
This study looks at how the COVID-19 pandemic has affected students in the Delhi-NCR, focusing on their mental health, social interactions, and education. The shift to online learning has been tough for traditional educational systems, making it important to understand its effects on students. This information is key to creating better educational strategies and mental health services. The research also aligns with Sustainable Development Goal 3 (SDG 3), which aims to ensure healthy lives and promote well-being for all at all ages.
The report is for researchers, educators, policymakers, and others interested in how COVID-19 has affected students’ psychology and education. The main goals are to evaluate how well online learning worked during the pandemic, examine the link between students’ daily activities and their health, identify ways students cope with stress and anxiety, and offer suggestions for improving education and mental health services.
The study uses data on students’ daily activities, health issues, and online class experiences, applying data analysis methods and statistical tools. There are limitations, such as potential biases in self-reported data and the focus on a specific region. The study assumes that the data accurately represents the experiences of students in the Delhi-NCR.
Despite these limitations, this study is important for understanding the wider impacts of the COVID-19 pandemic on students’ lives. It provides valuable insights that could shape future education strategies and mental health services. The report offers a detailed look at current issues and potential solutions, making it a useful resource for anyone concerned about students’ well-being during and after the pandemic.
The data for this research was collected from an online survey given to students between July 13 and July 17, 2020, using Google online platforms. The aim was to gather information on how the COVID-19 pandemic affected students’ daily lives, health, online class experiences, and other areas. This dataset is referenced in the research article “COVID-19 and its Impact on Education, Social Life, and Mental Health of Students: A Survey” by Chaturvedi, Vishwakarma, and Singh (2020), and is available to the public on Kaggle.
The dataset contains 19 variables and 1,182 responses, each reflecting different aspects of students’ experiences during the pandemic. Below is a brief overview of each variable:
# import the data from a CSV file into a data frame
df.raw <- read.csv("C:\\BINUS\\Semester 2\\DMV\\COVID-19 Survey Student Responses.csv")
head(df.raw, n=10)
## ID Region.of.residence Age.of.Subject Time.spent.on.Online.Class
## 1 R1 Delhi-NCR 21 2
## 2 R2 Delhi-NCR 21 0
## 3 R3 Delhi-NCR 20 7
## 4 R4 Delhi-NCR 20 3
## 5 R5 Delhi-NCR 21 3
## 6 R6 Delhi-NCR 21 0
## 7 R7 Delhi-NCR 19 2
## 8 R8 Outside Delhi-NCR 19 2
## 9 R9 Delhi-NCR 21 3
## 10 R10 Outside Delhi-NCR 20 0
## Rating.of.Online.Class.experience Medium.for.online.class
## 1 Good Laptop/Desktop
## 2 Excellent Smartphone
## 3 Very poor Laptop/Desktop
## 4 Very poor Smartphone
## 5 Good Laptop/Desktop
## 6 Very poor Smartphone
## 7 Very poor Smartphone
## 8 Very poor Tablet
## 9 Very poor Laptop/Desktop
## 10 Very poor Laptop/Desktop
## Time.spent.on.self.study Time.spent.on.fitness Time.spent.on.sleep
## 1 4 0.0 7
## 2 0 2.0 10
## 3 3 0.0 6
## 4 2 1.0 6
## 5 3 1.0 8
## 6 6 0.0 5
## 7 2 1.0 5
## 8 1 1.0 10
## 9 4 1.0 8
## 10 1 0.5 8
## Time.spent.on.social.media Prefered.social.media.platform Time.spent.on.TV
## 1 3 Linkedin 1
## 2 3 Youtube 0
## 3 2 Linkedin 0
## 4 5 Instagram 0
## 5 3 Instagram 1
## 6 1 Youtube 0
## 7 4 Instagram 0
## 8 5 Instagram 0
## 9 2 Whatsapp 1
## 10 5 Instagram 3
## Number.of.meals.per.day Change.in.your.weight Health.issue.during.lockdown
## 1 4 Increased NO
## 2 3 Decreased NO
## 3 3 Remain Constant NO
## 4 3 Decreased NO
## 5 4 Remain Constant NO
## 6 1 Decreased YES
## 7 3 Increased NO
## 8 3 Increased YES
## 9 3 Increased NO
## 10 3 Decreased YES
## Stress.busters Time.utilized
## 1 Cooking YES
## 2 Scrolling through social media YES
## 3 Listening to music NO
## 4 Watching web series NO
## 5 Social Media NO
## 6 Coding and studying for exams NO
## 7 Watching web series NO
## 8 Scrolling through social media NO
## 9 Online surfing NO
## 10 live stream watching NO
## Do.you.find.yourself.more.connected.with.your.family..close.friends...relatives...
## 1 YES
## 2 NO
## 3 YES
## 4 NO
## 5 NO
## 6 YES
## 7 YES
## 8 YES
## 9 NO
## 10 NO
## What.you.miss.the.most
## 1 School/college
## 2 Roaming around freely
## 3 Travelling
## 4 Friends , relatives
## 5 Travelling
## 6 School/college
## 7 Friends , relatives
## 8 Eating outside
## 9 Friends , relatives
## 10 School/college
# showing the structure of our dataframe
str(df.raw)
## 'data.frame': 1182 obs. of 19 variables:
## $ ID : chr "R1" "R2" "R3" "R4" ...
## $ Region.of.residence : chr "Delhi-NCR" "Delhi-NCR" "Delhi-NCR" "Delhi-NCR" ...
## $ Age.of.Subject : int 21 21 20 20 21 21 19 19 21 20 ...
## $ Time.spent.on.Online.Class : num 2 0 7 3 3 0 2 2 3 0 ...
## $ Rating.of.Online.Class.experience : chr "Good" "Excellent" "Very poor" "Very poor" ...
## $ Medium.for.online.class : chr "Laptop/Desktop" "Smartphone" "Laptop/Desktop" "Smartphone" ...
## $ Time.spent.on.self.study : num 4 0 3 2 3 6 2 1 4 1 ...
## $ Time.spent.on.fitness : num 0 2 0 1 1 0 1 1 1 0.5 ...
## $ Time.spent.on.sleep : num 7 10 6 6 8 5 5 10 8 8 ...
## $ Time.spent.on.social.media : num 3 3 2 5 3 1 4 5 2 5 ...
## $ Prefered.social.media.platform : chr "Linkedin" "Youtube" "Linkedin" "Instagram" ...
## $ Time.spent.on.TV : chr "1" "0" "0" "0" ...
## $ Number.of.meals.per.day : int 4 3 3 3 4 1 3 3 3 3 ...
## $ Change.in.your.weight : chr "Increased" "Decreased" "Remain Constant" "Decreased" ...
## $ Health.issue.during.lockdown : chr "NO" "NO" "NO" "NO" ...
## $ Stress.busters : chr "Cooking" "Scrolling through social media" "Listening to music" "Watching web series" ...
## $ Time.utilized : chr "YES" "YES" "NO" "NO" ...
## $ Do.you.find.yourself.more.connected.with.your.family..close.friends...relatives...: chr "YES" "NO" "YES" "NO" ...
## $ What.you.miss.the.most : chr "School/college" "Roaming around freely" "Travelling" "Friends , relatives" ...
Preprocessing was carried out in several steps to make the data consistent and ready for analysis: renaming columns for readability, dropping the ID column, standardizing inconsistent category values, converting data types, and imputing missing values.
# use dplyr's rename() to give the columns shorter, consistent names
# the left side is the new column name and the right side is the old column name
df.raw <- df.raw %>% rename(
Region = Region.of.residence,
Age = Age.of.Subject,
OnlineClass_Time = Time.spent.on.Online.Class,
OnlineClass_Rating = Rating.of.Online.Class.experience,
OnlineClass_Medium = Medium.for.online.class,
SelfStudy_Duration = Time.spent.on.self.study,
Fitness_Duration = Time.spent.on.fitness,
Sleeping_Duration = Time.spent.on.sleep,
SosMed_Duration = Time.spent.on.social.media,
SosMed_Medium = Prefered.social.media.platform,
WatchingTV_Duration = Time.spent.on.TV,
NumberOfMeals = Number.of.meals.per.day,
WeightChange = Change.in.your.weight,
Having_HealthIssue = Health.issue.during.lockdown,
Stress_Busters = Stress.busters,
Time_Utilized = Time.utilized,
Connected_with_Family_Friends_Relatives = Do.you.find.yourself.more.connected.with.your.family..close.friends...relatives...,
Most_Missed_Things = What.you.miss.the.most
)
# create a new data frame by subsetting the old one, dropping the ID column since it carries no analytical value
df <- subset(df.raw, select = -c(ID))
# check the structure of the new data frame
str(df)
## 'data.frame': 1182 obs. of 18 variables:
## $ Region : chr "Delhi-NCR" "Delhi-NCR" "Delhi-NCR" "Delhi-NCR" ...
## $ Age : int 21 21 20 20 21 21 19 19 21 20 ...
## $ OnlineClass_Time : num 2 0 7 3 3 0 2 2 3 0 ...
## $ OnlineClass_Rating : chr "Good" "Excellent" "Very poor" "Very poor" ...
## $ OnlineClass_Medium : chr "Laptop/Desktop" "Smartphone" "Laptop/Desktop" "Smartphone" ...
## $ SelfStudy_Duration : num 4 0 3 2 3 6 2 1 4 1 ...
## $ Fitness_Duration : num 0 2 0 1 1 0 1 1 1 0.5 ...
## $ Sleeping_Duration : num 7 10 6 6 8 5 5 10 8 8 ...
## $ SosMed_Duration : num 3 3 2 5 3 1 4 5 2 5 ...
## $ SosMed_Medium : chr "Linkedin" "Youtube" "Linkedin" "Instagram" ...
## $ WatchingTV_Duration : chr "1" "0" "0" "0" ...
## $ NumberOfMeals : int 4 3 3 3 4 1 3 3 3 3 ...
## $ WeightChange : chr "Increased" "Decreased" "Remain Constant" "Decreased" ...
## $ Having_HealthIssue : chr "NO" "NO" "NO" "NO" ...
## $ Stress_Busters : chr "Cooking" "Scrolling through social media" "Listening to music" "Watching web series" ...
## $ Time_Utilized : chr "YES" "YES" "NO" "NO" ...
## $ Connected_with_Family_Friends_Relatives: chr "YES" "NO" "YES" "NO" ...
## $ Most_Missed_Things : chr "School/college" "Roaming around freely" "Travelling" "Friends , relatives" ...
# function to print the unique values in a column and how many there are (the column name is passed as a parameter)
CekUnique <- function(i)
{
# cat is a function which stands for concatenate and print
cat(i,": [",length(unique(df[[i]])),"]\n") # the total of unique value count
cat(unique(df[[i]]), sep = "; ") # unique value
cat("\n\n")
}
# check unique for some specific columns in our data frame
CekUnique("SosMed_Medium")
## SosMed_Medium : [ 16 ]
## Linkedin; Youtube; Instagram; Whatsapp; None; Reddit; Snapchat; Omegle; Twitter; Telegram; Facebook; Elyment; None ; Quora; Talklife; WhatsApp
CekUnique("WatchingTV_Duration")
## WatchingTV_Duration : [ 25 ]
## 1; 0; 3; 0.5; n; 2; 4.5; 1.5; N; 4; 0.3; 5; No tv; 0.1; 0.25; 6; 0.6; 7; 8; 15; ; 0.75; 2.5; 3.5; 0.4
CekUnique("Most_Missed_Things")
## Most_Missed_Things : [ 51 ]
## School/college; Roaming around freely; Travelling; Friends , relatives; Eating outside; Colleagues; Job; Nothing this is my usual life; Gym; All of them; Friends and School; Friends,Romaing and traveling; Only friends; Taking kids to park; All of the above ; All the above; Internet; Going to the movies; Nothing; school, relatives and friends; All ; All; all of the above; Nothing ; Eating outside and friends.; All above; Nah, this is my usual lifestyle anyway, just being lazy....; Family ; Football; Normal life; The idea of being around fun loving people but this time has certainly made us all to reconnect (and fill the gap if any) with our families and relatives so it is fun but certainly we do miss hanging out with friends; My normal routine; ALL; NOTHING; Being social ; Previous mistakes; all; nothing; Playing; .; I have missed nothing ; Travelling & Friends; To stay alone. ; Family; Badminton in court; Friends, relatives & travelling; everything; Friends and roaming around freely; Metro; School and friends.; School and my school friends
This is useful during preprocessing for spotting inconsistencies and missing information. Several cleanups follow: values such as “None” are replaced with NA to handle missing data consistently, inconsistent spellings are unified (“Whatsapp” becomes “WhatsApp”, “No tv” becomes 0), and the free-text variants of “All” and “Nothing” in Most_Missed_Things are consolidated.
df$SosMed_Medium[df$SosMed_Medium %in% c("None", "None ")] <- NA
df$SosMed_Medium[df$SosMed_Medium == "Whatsapp"] <- "WhatsApp"
df$WatchingTV_Duration[df$WatchingTV_Duration %in% c("N", "n", " ", "")] <- NA
df$WatchingTV_Duration[df$WatchingTV_Duration=="No tv"] <- 0
df$Most_Missed_Things[df$Most_Missed_Things %in% c("All ", "all", "All of the above ", "All the above", "ALL", "everything", "all of the above", "All above", "All of them")] <- "All"
df$Most_Missed_Things[df$Most_Missed_Things %in% c("NOTHING", "I have missed nothing ", "nothing", "Nothing this is my usual life", "Nothing ", ".")] <- "Nothing"
# Convert the data type from character to numeric
df$WatchingTV_Duration = as.numeric(df$WatchingTV_Duration)
df$Region <- factor(df$Region, levels = unique(df$Region))
CekUnique("OnlineClass_Rating")
## OnlineClass_Rating : [ 6 ]
## Good; Excellent; Very poor; Average; NA; Poor
df$OnlineClass_Rating <- factor(df$OnlineClass_Rating, levels = c("Very poor","Poor","Average","Good","Excellent"))
df$OnlineClass_Medium <- factor(df$OnlineClass_Medium, levels = unique(df$OnlineClass_Medium))
df$SosMed_Medium <- factor(df$SosMed_Medium, levels = unique(df$SosMed_Medium))
CekUnique("WeightChange")
## WeightChange : [ 3 ]
## Increased; Decreased; Remain Constant
df$WeightChange <- factor (df$WeightChange, levels = c("Decreased","Remain Constant","Increased"))
df$Having_HealthIssue <- factor(df$Having_HealthIssue)
df$Stress_Busters <- factor(df$Stress_Busters, levels = unique(df$Stress_Busters))
df$Time_Utilized <- factor(df$Time_Utilized)
df$Connected_with_Family_Friends_Relatives <- factor(df$Connected_with_Family_Friends_Relatives)
df$Most_Missed_Things <- factor(df$Most_Missed_Things, levels = unique(df$Most_Missed_Things))
# collect column names by data type: numeric/integer vs. categorical/character
num_cols = c()
cat_cols = c()
for (i in names(df))
{
if (is.numeric(df[[i]]) || is.integer(df[[i]]))
{
num_cols <- c(num_cols,i)
}
else
{
cat_cols <- c(cat_cols,i)
}
}
cat("Numeric: ",paste(num_cols,collapse=", "),"\n\n")
## Numeric: Age, OnlineClass_Time, SelfStudy_Duration, Fitness_Duration, Sleeping_Duration, SosMed_Duration, WatchingTV_Duration, NumberOfMeals
cat("Categorical: ",paste(cat_cols,collapse=", "),"\n")
## Categorical: Region, OnlineClass_Rating, OnlineClass_Medium, SosMed_Medium, WeightChange, Having_HealthIssue, Stress_Busters, Time_Utilized, Connected_with_Family_Friends_Relatives, Most_Missed_Things
# Count how many NA value in each column
for (i in names(df))
{
cat(i,": ",sum(is.na(df[[i]])),"\n")
}
## Region : 0
## Age : 0
## OnlineClass_Time : 0
## OnlineClass_Rating : 24
## OnlineClass_Medium : 51
## SelfStudy_Duration : 0
## Fitness_Duration : 0
## Sleeping_Duration : 0
## SosMed_Duration : 0
## SosMed_Medium : 18
## WatchingTV_Duration : 10
## NumberOfMeals : 0
## WeightChange : 0
## Having_HealthIssue : 0
## Stress_Busters : 0
## Time_Utilized : 0
## Connected_with_Family_Friends_Relatives : 0
## Most_Missed_Things : 0
# Fill NA values: use the mode for categorical variables and the mean for numeric variables
for (i in cat_cols)
{
# names is used to extract the name and not the frequency value
mode <- names(sort(table(df[[i]]), decreasing=TRUE)[1])
df[[i]][is.na(df[[i]])] <- mode
}
for (i in num_cols)
{
mean <- mean(df[[i]], na.rm = TRUE)
df[[i]][is.na(df[[i]])] <- mean
}
Verify the structure of our dataframe after completing the data preprocessing steps.
str(df)
## 'data.frame': 1182 obs. of 18 variables:
## $ Region : Factor w/ 2 levels "Delhi-NCR","Outside Delhi-NCR": 1 1 1 1 1 1 1 2 1 2 ...
## $ Age : num 21 21 20 20 21 21 19 19 21 20 ...
## $ OnlineClass_Time : num 2 0 7 3 3 0 2 2 3 0 ...
## $ OnlineClass_Rating : Factor w/ 5 levels "Very poor","Poor",..: 4 5 1 1 4 1 1 1 1 1 ...
## $ OnlineClass_Medium : Factor w/ 5 levels "Laptop/Desktop",..: 1 2 1 2 1 2 2 3 1 1 ...
## $ SelfStudy_Duration : num 4 0 3 2 3 6 2 1 4 1 ...
## $ Fitness_Duration : num 0 2 0 1 1 0 1 1 1 0.5 ...
## $ Sleeping_Duration : num 7 10 6 6 8 5 5 10 8 8 ...
## $ SosMed_Duration : num 3 3 2 5 3 1 4 5 2 5 ...
## $ SosMed_Medium : Factor w/ 13 levels "Linkedin","Youtube",..: 1 2 1 3 3 2 3 3 4 3 ...
## $ WatchingTV_Duration : num 1 0 0 0 1 0 0 0 1 3 ...
## $ NumberOfMeals : num 4 3 3 3 4 1 3 3 3 3 ...
## $ WeightChange : Factor w/ 3 levels "Decreased","Remain Constant",..: 3 1 2 1 2 1 3 3 3 1 ...
## $ Having_HealthIssue : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 2 1 2 1 2 ...
## $ Stress_Busters : Factor w/ 86 levels "Cooking","Scrolling through social media",..: 1 2 3 4 5 6 4 2 7 8 ...
## $ Time_Utilized : Factor w/ 2 levels "NO","YES": 2 2 1 1 1 1 1 1 1 1 ...
## $ Connected_with_Family_Friends_Relatives: Factor w/ 2 levels "NO","YES": 2 1 2 1 1 2 2 2 1 1 ...
## $ Most_Missed_Things : Factor w/ 36 levels "School/college",..: 1 2 3 4 3 1 4 5 4 1 ...
summary(df)
## Region Age OnlineClass_Time OnlineClass_Rating
## Delhi-NCR :721 Min. : 7.00 Min. : 0.000 Very poor:437
## Outside Delhi-NCR:461 1st Qu.:17.00 1st Qu.: 2.000 Poor : 30
## Median :20.00 Median : 3.000 Average :387
## Mean :20.17 Mean : 3.209 Good :230
## 3rd Qu.:21.00 3rd Qu.: 5.000 Excellent: 98
## Max. :59.00 Max. :10.000
##
## OnlineClass_Medium SelfStudy_Duration Fitness_Duration
## Laptop/Desktop :596 Min. : 0.000 Min. :0.0000
## Smartphone :539 1st Qu.: 2.000 1st Qu.:0.0000
## Tablet : 37 Median : 2.000 Median :1.0000
## Any Gadget : 5 Mean : 2.912 Mean :0.7658
## Smartphone or Laptop/Desktop: 5 3rd Qu.: 4.000 3rd Qu.:1.0000
## Max. :18.000 Max. :5.0000
##
## Sleeping_Duration SosMed_Duration SosMed_Medium WatchingTV_Duration
## Min. : 4.000 Min. : 0.000 Instagram:370 Min. : 0.000
## 1st Qu.: 7.000 1st Qu.: 1.000 WhatsApp :337 1st Qu.: 0.000
## Median : 8.000 Median : 2.000 Youtube :314 Median : 1.000
## Mean : 7.871 Mean : 2.366 Linkedin : 61 Mean : 1.023
## 3rd Qu.: 9.000 3rd Qu.: 3.000 Facebook : 52 3rd Qu.: 2.000
## Max. :15.000 Max. :10.000 Twitter : 28 Max. :15.000
## (Other) : 20
## NumberOfMeals WeightChange Having_HealthIssue
## Min. :1.000 Decreased :209 NO :1021
## 1st Qu.:2.000 Remain Constant:535 YES: 161
## Median :3.000 Increased :438
## Mean :2.918
## 3rd Qu.:3.000
## Max. :8.000
##
## Stress_Busters Time_Utilized
## Listening to music :276 NO :608
## Online gaming :175 YES:574
## Watching web series :102
## Reading books : 77
## Scrolling through social media: 74
## Sleeping : 71
## (Other) :407
## Connected_with_Family_Friends_Relatives Most_Missed_Things
## NO :351 School/college :379
## YES:831 Friends , relatives :223
## Travelling :183
## Roaming around freely:149
## Eating outside :104
## Colleagues : 67
## (Other) : 77
The summary shows how often each category was chosen for factor variables such as Region, OnlineClass_Rating, and OnlineClass_Medium, and reports distributional statistics for numeric variables such as Age, OnlineClass_Time, and SelfStudy_Duration. Together, these summaries give a first look at how participants spent their time and managed daily activities during the pandemic.
Note: (Other) combines all remaining levels of a factor with more than six levels into a single category. For numeric variables, summary() reports the mean together with Tukey’s five-number summary: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.
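As a quick illustration, base R’s fivenum() computes exactly this five-number summary for a numeric vector:

```r
# Tukey's five-number summary for a small numeric vector
x <- c(4, 7, 8, 8, 10, 15)
fivenum(x) # minimum, lower hinge (Q1), median, upper hinge (Q3), maximum: 4 7 8 10 15
```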
BasicSummary(df)
## variable type levels topLevel
## 1 Region factor 2 Delhi-NCR
## 2 Age numeric 42 20
## 3 OnlineClass_Time numeric 21 4
## 4 OnlineClass_Rating factor 5 Very poor
## 5 OnlineClass_Medium factor 5 Laptop/Desktop
## 6 SelfStudy_Duration numeric 23 2
## 7 Fitness_Duration numeric 11 1
## 8 Sleeping_Duration numeric 18 8
## 9 SosMed_Duration numeric 22 1
## 10 SosMed_Medium factor 13 Instagram
## 11 WatchingTV_Duration numeric 22 0
## 12 NumberOfMeals numeric 8 3
## 13 WeightChange factor 3 Remain Constant
## 14 Having_HealthIssue factor 2 NO
## 15 Stress_Busters factor 86 Listening to music
## 16 Time_Utilized factor 2 NO
## 17 Connected_with_Family_Friends_Relatives factor 2 YES
## 18 Most_Missed_Things factor 36 School/college
## topCount topFrac missFreq missFrac
## 1 721 0.610 0 0
## 2 211 0.179 0 0
## 3 222 0.188 0 0
## 4 437 0.370 0 0
## 5 596 0.504 0 0
## 6 346 0.293 0 0
## 7 550 0.465 0 0
## 8 390 0.330 0 0
## 9 343 0.290 0 0
## 10 370 0.313 0 0
## 11 444 0.376 0 0
## 12 610 0.516 0 0
## 13 535 0.453 0 0
## 14 1021 0.864 0 0
## 15 276 0.234 0 0
## 16 608 0.514 0 0
## 17 831 0.703 0 0
## 18 379 0.321 0 0
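BasicSummary() is defined in explorationFunction.R, which is not shown in this report. A minimal sketch of what such a helper might compute for each column (a hypothetical reconstruction, not the actual source) is:

```r
# Hypothetical sketch of a BasicSummary-style helper; the real function in
# explorationFunction.R may differ in details
BasicSummarySketch <- function(data) {
  data.frame(
    variable = names(data),
    type     = sapply(data, function(x) class(x)[1]),
    levels   = sapply(data, function(x) length(unique(x))),          # distinct values
    topLevel = sapply(data, function(x) names(which.max(table(x)))), # most frequent value
    topFrac  = sapply(data, function(x) round(max(table(x)) / length(x), 3)),
    missFreq = sapply(data, function(x) sum(is.na(x))),              # NA count
    row.names = NULL
  )
}
```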
num_df <- subset(df, select = num_cols) # make a new data frame containing only the numeric/integer columns
convert_matrix(num_df)
## Age OnlineClass_Time SelfStudy_Duration Fitness_Duration
## Mean 20.17 3.21 2.91 0.77
## Median 20.00 3.00 2.00 1.00
## Variance 30.43 4.42 4.58 0.52
## Standard Deviation 5.52 2.10 2.14 0.72
## Range 52.00 10.00 18.00 5.00
## Interquartile Range 4.00 3.00 2.00 1.00
## Sleeping_Duration SosMed_Duration WatchingTV_Duration
## Mean 7.87 2.37 1.02
## Median 8.00 2.00 1.00
## Variance 2.61 3.12 1.60
## Standard Deviation 1.62 1.77 1.26
## Range 11.00 10.00 15.00
## Interquartile Range 2.00 2.00 2.00
## NumberOfMeals
## Mean 2.92
## Median 3.00
## Variance 0.69
## Standard Deviation 0.83
## Range 7.00
## Interquartile Range 1.00
These statistical measures describe how each numeric variable is distributed: central tendency (mean, median), spread (variance, standard deviation), and extent (range, interquartile range).
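convert_matrix() also comes from explorationFunction.R. A hypothetical sketch of a helper that would produce the table above (assumed, not the actual source) is:

```r
# Hypothetical sketch of a convert_matrix-style helper; the real function in
# explorationFunction.R may differ in details
convert_matrix_sketch <- function(data) {
  stats <- sapply(data, function(x) c(
    Mean                  = mean(x, na.rm = TRUE),
    Median                = median(x, na.rm = TRUE),
    Variance              = var(x, na.rm = TRUE),
    `Standard Deviation`  = sd(x, na.rm = TRUE),
    Range                 = diff(range(x, na.rm = TRUE)),
    `Interquartile Range` = IQR(x, na.rm = TRUE)
  ))
  round(stats, 2) # variables as columns, statistics as rows
}
```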
# Age of Respondent visualization (ages grouped into bins of 10 years)
age_plot <- ggplot(df, aes(x = Age, fill=Having_HealthIssue)) + # x axis represents age; fill color is split by health-issue status
geom_histogram(color = "black", binwidth = 10) + # histogram with black outlines; bars are grouped into intervals of 10
# add label to our graph
labs(title = "Distribution of Respondents by Age",
x = "Age",
y = "Number of Respondents",
fill = "Having Health Issue?") +
scale_fill_brewer(palette = "Set2") + # filling colors using color palettes from the RColorBrewer package
coord_flip() # swap the x and y axes; useful when category labels are too long to fit on the x axis
ggplotly(age_plot)
The survey responses indicate that most respondents are teenagers or in their twenties. This matters because different age groups face different challenges: younger students may find it hard to stay focused and disciplined when learning online, while older students may worry about their future careers.
# Respondent's Region of Residence visualization
residence_plot <- pie(table(df$Region), # table is used to create a contingency table of the counts of each category
main = "Distribution of Respondents by Region of Residence",
col = c("purple", "pink"))
The analysis shows that a majority of respondents, roughly 61%, live in the Delhi-National Capital Region (Delhi-NCR).
time_plot <- pie(table(df$Time_Utilized),
main = "Distribution of Respondents by Time Management",
col = c("purple", "pink"))
The study reveals that time management has been a significant challenge for students during the pandemic. According to the research, 51.44% of students felt they were not effectively using their time. This finding illustrates the struggle students face in adapting to new routines and managing their time well in an online learning setup.
health_plot <- pie(table(df$Having_HealthIssue),
main = "Distribution of Respondents by Having Health Issue",
col = c("purple", "pink"))
Based on the survey, about 13.62% of students mentioned having health issues, showing a notable impact on their physical and mental well-being. This finding mirrors a broader worldwide pattern of increased health concerns due to extended lockdowns and lifestyle adjustments.
# Respondent's Preferred Social Media Platform visualization
sosmed_plot <- df %>% # %>% is a pipe operator to chain multiple operation
group_by(SosMed_Medium) %>% # group data by SosMed_Medium
summarise(freqCount = n()) %>% # returns one row for each combination of grouping variables; n() -> counts the number of frequency count in each group
ggplot(aes(x=reorder(SosMed_Medium, freqCount), y=freqCount, fill=freqCount)) + # reorder the SosMed_Medium factor by freqCount in ascending order; it appears descending because coord_flip() is applied below
geom_bar(color="black", stat="identity") + # stat = "identity" -> height of the bars is determined by freqCount
labs(title = "Distribution of Respondents by Preferred Social Media Platform",
x = "Social Media Platforms",
y = "Number of Respondents",
fill = "Frequency Count") +
scale_fill_gradient(low = "lightblue", high = "darkblue") + # fill with a gradient from light blue to dark blue
coord_flip()
ggplotly(sosmed_plot)
The report also examines which social media platforms students preferred most, offering insights into their online habits and how social media helped them stay connected during the pandemic. Instagram and WhatsApp emerged as the top choices, highlighting their significant role in students’ social interactions.
# Top 10 Respondent's Preferred Stress Busters
stress_plot <- df %>%
group_by(Stress_Busters) %>%
summarise(freqCount = n()) %>%
top_n(10, freqCount) %>% # filter the top 10 stress busters
ggplot(aes(x=reorder(Stress_Busters, freqCount), y=freqCount, fill=freqCount)) +
geom_bar(color="black", stat="identity") +
labs(title = "Distribution of Respondents by Preferred Stress Busters (Top 10)",
x = "Activity",
y = "Number of Respondents",
fill = "Frequency Count") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
coord_flip()
ggplotly(stress_plot)
Managing stress is another important concern. Throughout the pandemic, students found solace in activities like listening to music and playing online games, their two most common ways of coping with stress. These activities helped students navigate challenging times by providing relaxation and stress relief.
# Top 10 Things the Respondent Miss the Most
miss_plot <- df %>%
group_by(Most_Missed_Things) %>%
summarise(freqCount = n()) %>%
top_n(10, freqCount) %>%
ggplot(aes(x=reorder(Most_Missed_Things, freqCount), y=freqCount, fill=freqCount)) +
geom_bar(color="black", stat="identity") +
labs(title = "Distribution of Respondents by Things they Miss the Most (Top 10)",
x = "Object/ Subject",
y = "Number of Respondents",
fill = "Frequency Count") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
coord_flip()
ggplotly(miss_plot)
Students expressed longing for aspects of their lives before the pandemic, such as attending school and college and spending time with friends and family. These factors highlight the routines and connections students valued most and illustrate how the pandemic disrupted their social and academic environments. The lack of face-to-face interactions and structured schedules had a significant impact on their daily lives.
# store the column name of variables related to duration
duration_cols <- c('OnlineClass_Time', 'SelfStudy_Duration', 'Fitness_Duration',
'Sleeping_Duration', 'SosMed_Duration', 'WatchingTV_Duration')
# calculate mean duration for each activity
mean_durations <- sapply(duration_cols, function(col) mean(df[[col]], na.rm = TRUE)) # anonymous function computes the mean of each duration column
names(mean_durations) <- duration_cols # labeling the vector
# create a long-format data frame
df_long <- stack(df[, duration_cols]) # stack the duration_cols -> resulting a data frame with 2 columns (the value and the names of stacked columns)
names(df_long) <- c("Duration", "Activity")
# reorder activities by mean of duration
ordered_activities <- names(sort(mean_durations))
df_long$Activity <- factor(df_long$Activity, levels = ordered_activities)
# write.table(df_long, file="df_long.csv", col.names = TRUE, row.names = FALSE, sep=",")
# create the box plot
duration_plot <- ggplot(df_long, aes(y = Activity, x = Duration)) +
geom_boxplot(fill="gray") +
labs(title = "Distribution of Time Spent by Activity",
x = "Duration (hours)",
y = "Type of Activity")
print(duration_plot)
Looking into how students used their time uncovered some concerning patterns. Exercise was the least prioritized activity, suggesting a lack of physical activity that can have lasting negative effects on both physical and mental well-being. It is crucial to encourage students to incorporate more physical activity into their daily schedules.
# remove outliers in the duration columns, based on the outliers seen in the boxplot above
for (i in duration_cols)
{
outliers <- boxplot.stats(df[[i]])$out # identify outliers
df <- df[!df[[i]] %in% outliers, ] # remove the whole row if it contains outliers
}
str(df)
## 'data.frame': 1048 obs. of 18 variables:
## $ Region : Factor w/ 2 levels "Delhi-NCR","Outside Delhi-NCR": 1 1 1 1 1 1 1 2 1 2 ...
## $ Age : num 21 21 20 20 21 21 19 19 21 20 ...
## $ OnlineClass_Time : num 2 0 7 3 3 0 2 2 3 0 ...
## $ OnlineClass_Rating : Factor w/ 5 levels "Very poor","Poor",..: 4 5 1 1 4 1 1 1 1 1 ...
## $ OnlineClass_Medium : Factor w/ 5 levels "Laptop/Desktop",..: 1 2 1 2 1 2 2 3 1 1 ...
## $ SelfStudy_Duration : num 4 0 3 2 3 6 2 1 4 1 ...
## $ Fitness_Duration : num 0 2 0 1 1 0 1 1 1 0.5 ...
## $ Sleeping_Duration : num 7 10 6 6 8 5 5 10 8 8 ...
## $ SosMed_Duration : num 3 3 2 5 3 1 4 5 2 5 ...
## $ SosMed_Medium : Factor w/ 13 levels "Linkedin","Youtube",..: 1 2 1 3 3 2 3 3 4 3 ...
## $ WatchingTV_Duration : num 1 0 0 0 1 0 0 0 1 3 ...
## $ NumberOfMeals : num 4 3 3 3 4 1 3 3 3 3 ...
## $ WeightChange : Factor w/ 3 levels "Decreased","Remain Constant",..: 3 1 2 1 2 1 3 3 3 1 ...
## $ Having_HealthIssue : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 2 1 2 1 2 ...
## $ Stress_Busters : Factor w/ 86 levels "Cooking","Scrolling through social media",..: 1 2 3 4 5 6 4 2 7 8 ...
## $ Time_Utilized : Factor w/ 2 levels "NO","YES": 2 2 1 1 1 1 1 1 1 1 ...
## $ Connected_with_Family_Friends_Relatives: Factor w/ 2 levels "NO","YES": 2 1 2 1 1 2 2 2 1 1 ...
## $ Most_Missed_Things : Factor w/ 36 levels "School/college",..: 1 2 3 4 3 1 4 5 4 1 ...
After removing outliers, the data is reduced to 1,048 observations.
# reshape the data into a long format
df_long <- df %>%
pivot_longer(
cols = c(OnlineClass_Time, SelfStudy_Duration, Fitness_Duration, Sleeping_Duration, SosMed_Duration, WatchingTV_Duration),
names_to = "Activity",
values_to = "Duration"
)
p_combined <- ggplot(df_long, aes(x = Having_HealthIssue, y = Duration, fill = Having_HealthIssue)) +
geom_boxplot(color="black") +
coord_flip() +
labs(title="Time spent on Various Activities by Health issue during lockdown",
x="Health issue during lockdown",
y="Time spent") +
facet_wrap(~ Activity) +
scale_fill_manual(values = c("YES" = "pink", "NO" = "purple")) +
theme(legend.position = "none")
print(p_combined)
The study did not establish a straightforward connection between activity levels and health issues, suggesting that students’ health is shaped by multiple factors, such as mental well-being, diet, and pre-existing medical conditions. Understanding these dynamics would require a multivariate analysis rather than single-variable comparisons.
meal_health_plot <- ggplot(df, aes(x = Having_HealthIssue, y = NumberOfMeals)) +
geom_violin(color="black", fill="gray") +
coord_flip() +
labs(title="Number of Meals a Day by Health issue during lockdown",
x="Health issue during lockdown",
y="Number of Meals a Day") +
scale_fill_brewer(palette="Set1") + # set color palette to fill the graph
theme(legend.position = "none")
print(meal_health_plot)
weight_change_plot <- ggplot(df, aes(x=NumberOfMeals, y=WeightChange, fill=WeightChange)) +
geom_density_ridges() +
labs(title = "Change in Weight vs. Number of Meals per Day",
x = "Number of Meals per Day",
y = "Change in Weight",
fill = "Change in Weight") +
theme(axis.text.y = element_blank()) # hide y-axis text since the fill legend already labels the weight-change categories
print(weight_change_plot)
Next, we look for anomalies using three outlier detection rules: the three-sigma rule, the Hampel identifier, and the boxplot rule.
FindOutliers(df$NumberOfMeals)$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1048 0 3 0.5099625 5.306831 1 5
## 2 Hampel 1048 0 505 3.0000000 3.000000 3 3
## 3 BoxplotRule 1048 0 45 1.5000000 4.500000 2 4
The first column lists the outlier detection method. The columns n, nMiss, and nOut give the total number of observations, the number of missing (NA) values, and the number of outliers each method detected. lowLim and upLim are the limits used to flag outliers, while minNom and maxNom are the smallest and largest values not flagged as outliers.
For example, the three-sigma rule identified only 3 outliers, while the Hampel method found 505 outliers, and the boxplot rule identified 45 outliers.
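FindOutliers() is defined in explorationFunction.R (sourced at the top) and is not shown here. Assuming the standard definitions of the three rules, a minimal base-R sketch of the counts it reports would be:

```r
# sketch of the three detection rules summarized by FindOutliers()
# (assumed standard definitions; the actual helper lives in explorationFunction.R)
count_outliers <- function(x) {
  x <- x[!is.na(x)]
  # three-sigma rule: flag values outside mean +- 3 * sd
  sigma_lims  <- mean(x) + c(-3, 3) * sd(x)
  # Hampel rule: flag values outside median +- 3 * MAD
  hampel_lims <- median(x) + c(-3, 3) * mad(x)
  # boxplot rule: flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
  q <- quantile(x, c(0.25, 0.75))
  box_lims    <- q + c(-1.5, 1.5) * IQR(x)
  c(ThreeSigma  = sum(x < sigma_lims[1]  | x > sigma_lims[2]),
    Hampel      = sum(x < hampel_lims[1] | x > hampel_lims[2]),
    BoxplotRule = sum(x < box_lims[1]    | x > box_lims[2]))
}
```

This also explains the Hampel limits of [3, 3] for NumberOfMeals: the median is 3 and the MAD is 0, so every value other than 3 gets flagged, hence the 505 "outliers".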
FindOutliers(df$OnlineClass_Time)$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1048 0 0 -2.879921 9.336696 0.0 9
## 2 Hampel 1048 0 31 -1.447800 7.447800 0.0 7
## 3 BoxplotRule 1048 0 107 0.500000 9.500000 0.5 9
It’s evident that the three-sigma rule detected no outliers, whereas the Hampel method identified 31 outliers, and the boxplot rule flagged 107 outliers.
FindOutliers(df$SelfStudy_Duration)$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1048 0 0 -2.079668 7.337301 0 7
## 2 Hampel 1048 0 13 -2.447800 6.447800 0 6
## 3 BoxplotRule 1048 0 72 1.000000 7.000000 1 7
According to the analysis:

* The three-sigma rule found 0 outliers
* The Hampel method identified 13 outliers
* The boxplot rule detected 72 outliers
FindOutliers(df$Fitness_Duration)$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1048 0 0 -1.179393 2.620614 0 2.5
## 2 Hampel 1048 0 0 -1.223900 3.223900 0 2.5
## 3 BoxplotRule 1048 0 0 -0.500000 2.500000 0 2.5
It’s apparent that all the rules identified 0 outliers.
FindOutliers(df$Sleeping_Duration)$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1048 0 0 3.233145 12.46304 4 12
## 2 Hampel 1048 0 0 3.552200 12.44780 4 12
## 3 BoxplotRule 1048 0 41 6.000000 12.00000 6 12
We can observe that both the three-sigma and Hampel rules identified 0 outliers, while the boxplot rule flagged 41 outliers.
FindOutliers(df$SosMed_Duration)$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1048 0 0 -1.913046 6.248638 0 6
## 2 Hampel 1048 0 0 -2.447800 6.447800 0 6
## 3 BoxplotRule 1048 0 0 0.000000 6.000000 0 6
It’s apparent that all the rules identified 0 outliers.
FindOutliers(df$WatchingTV_Duration)$summary
## method n nMiss nOut lowLim upLim minNom maxNom
## 1 ThreeSigma 1048 0 11 -2.204534 4.13603 0 4
## 2 Hampel 1048 0 0 -3.447800 5.44780 0 5
## 3 BoxplotRule 1048 0 0 -1.000000 5.00000 0 5
According to the three-sigma rule, there are 11 outliers detected, while neither the Hampel nor the boxplot rules found any outliers.
Now, we’ll proceed to remove the outliers identified by the three-sigma rule.
# function to remove outliers using the three-sigma rule: values outside mean +- 3 * standard deviation are considered outliers
remove_outliers_3sigma <- function(df, col_name) {
  mean_val <- mean(df[[col_name]], na.rm = TRUE) # mean, ignoring NA values
  sd_val <- sd(df[[col_name]], na.rm = TRUE) # standard deviation, ignoring NA values
  lower_limit <- mean_val - 3 * sd_val
  upper_limit <- mean_val + 3 * sd_val
  # keep only the rows whose value in col_name falls within the limits
  # (the trailing comma selects rows while keeping every column)
  df_filtered <- df[df[[col_name]] >= lower_limit & df[[col_name]] <= upper_limit, ]
  return(df_filtered)
}
# remove outliers using our function
df <- remove_outliers_3sigma(df, "NumberOfMeals")
df <- remove_outliers_3sigma(df, "OnlineClass_Time")
df <- remove_outliers_3sigma(df, "SelfStudy_Duration")
df <- remove_outliers_3sigma(df, "Fitness_Duration")
df <- remove_outliers_3sigma(df, "Sleeping_Duration")
df <- remove_outliers_3sigma(df, "SosMed_Duration")
df <- remove_outliers_3sigma(df, "WatchingTV_Duration")
Before computing correlations, we apply the Shapiro-Wilk test to each numerical variable to assess normality, since Pearson correlation assumes normally distributed data.
col_names = c()
sph_test = c()
for (i in num_cols)
{
  col_names <- c(col_names, i)
  ans <- shapiro.test(df[[i]])$p.value # Shapiro-Wilk p-value for column i
  sph_test <- c(sph_test, ans)
}
sph_df = data.frame(Name = col_names, PVal = sph_test)
sph_df[order(sph_df$PVal),] # sort by p-value, smallest first
## Name PVal
## 4 Fitness_Duration 6.107004e-34
## 1 Age 3.445997e-32
## 7 WatchingTV_Duration 1.407972e-31
## 8 NumberOfMeals 9.956070e-30
## 6 SosMed_Duration 3.087665e-25
## 5 Sleeping_Duration 6.143239e-21
## 3 SelfStudy_Duration 1.334916e-20
## 2 OnlineClass_Time 3.777746e-16
None of the variables meets the criterion for normality, as all p-values are far below 0.05. We can therefore conclude that our numerical data does not follow a normal distribution, which means Pearson correlation is not well justified here.
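Since the numerical variables fail the normality test, Spearman's rank correlation, which operates on ranks and makes no normality assumption, is a safer alternative. In R it only requires setting the method argument (toy vectors for illustration):

```r
# Spearman rank correlation: rank-based, no normality assumption
x <- c(1, 2, 3, 4, 100)   # strongly skewed by one extreme value
y <- c(2, 4, 6, 8, 10)

cor(x, y)                       # Pearson (the default) is pulled well below 1 by the outlier
cor(x, y, method = "spearman")  # rank-based: a perfect monotone relation gives exactly 1
```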
num_df <- subset(df,select = num_cols) #update the num_df after removing some of the observations
# convert YES/NO factor columns to numeric (1 for YES, 0 for NO)
num_df$Having_HealthIssue <- as.numeric(df$Having_HealthIssue == "YES")
num_df$Time_Utilized <- as.numeric(df$Time_Utilized == "YES")
num_df$Connected_with_Family_Friends_Relatives <- as.numeric(df$Connected_with_Family_Friends_Relatives == "YES")
# convert WeightChange to numeric based on mappings
num_df$WeightChange <- ifelse(df$WeightChange == "Increased", 2,
ifelse(df$WeightChange == "Decreased", 0, 1))
c <- cor(num_df) # note: cor() defaults to Pearson's r; method = "spearman" would match the normality finding above
corrplot(c, type = "upper", method = "number",
         number.cex = 0.7, # font size of the correlation coefficients
         tl.cex = 0.7,     # font size of the column/row names
         tl.srt = 60)      # rotation angle of the column labels
# check correlation of having health issue category with quantitative data which is duration of different activities
for (i in duration_cols)
{
  cat(i, ": ", cor(num_df[[i]], num_df$Having_HealthIssue), "\n")
}
## OnlineClass_Time : -0.113825
## SelfStudy_Duration : -0.01142817
## Fitness_Duration : -0.04178206
## Sleeping_Duration : -0.0176028
## SosMed_Duration : 0.01514649
## WatchingTV_Duration : -0.03181608
All of these correlations are weak: the largest in magnitude is −0.11, for online class time. Longer durations spent on online classes, fitness activities, and watching TV show slight negative associations with health problems, while time on social media shows a negligible positive association. Individual activity durations therefore do not strongly predict the likelihood of having health issues, and factors not measured in this study likely also influence these relationships.
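Because Having_HealthIssue was coded 0/1, the values above are point-biserial correlations: simply Pearson's r computed on the binary coding (the ltm package's biserial.cor, loaded at the top, measures the same quantity). A toy check with hypothetical data:

```r
# point-biserial correlation: Pearson's r with a 0/1-coded binary variable
# (hypothetical toy data, not the survey data)
dur   <- c(2, 3, 4, 5, 6, 7)   # e.g. hours spent on an activity
issue <- c(0, 0, 0, 1, 1, 1)   # 1 = YES, 0 = NO

r_pb <- cor(dur, issue)        # Pearson's r on the 0/1 coding

# equivalent closed form of the point-biserial correlation
n  <- length(dur)
p  <- mean(issue)              # proportion of YES
m1 <- mean(dur[issue == 1])
m0 <- mean(dur[issue == 0])
r_formula <- (m1 - m0) / sd(dur) * sqrt(n * p * (1 - p) / (n - 1))

all.equal(r_pb, r_formula)     # TRUE: the two computations agree
```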
# compute the total time spent on productive and non-productive activities
num_df$TotalP = num_df$SelfStudy_Duration + num_df$Fitness_Duration # productive: self-study + fitness
num_df$TotalN = num_df$SosMed_Duration + num_df$WatchingTV_Duration # non-productive: social media + TV
# compute correlation
correlations <- cor(num_df[, c("TotalP", "TotalN", "Time_Utilized")])
print(correlations)
## TotalP TotalN Time_Utilized
## TotalP 1.0000000 -0.18156524 0.23649549
## TotalN -0.1815652 1.00000000 -0.09479337
## Time_Utilized 0.2364955 -0.09479337 1.00000000
ggplot(num_df, aes(x = TotalP, y = Time_Utilized)) +
  geom_point(color = "green4") +
  labs(title = "Correlation Between Self-Study and Fitness Duration with Time_Utilized") +
  geom_smooth(method = "lm") # add a linear trend line
As the combined time spent on productive activities like self-study and fitness increases, there is a slight tendency for the overall time utilized to also increase.
correlations <- cor(num_df[, c("NumberOfMeals", "WeightChange")])
print(correlations)
## NumberOfMeals WeightChange
## NumberOfMeals 1.0000000 0.1141691
## WeightChange 0.1141691 1.0000000
ggplot(num_df, aes(x = NumberOfMeals, y = WeightChange)) +
  geom_point(color = "green4") +
  labs(title = "Weight Change by Number of Meals") +
  geom_smooth(method = "lm") # add a linear trend line
This suggests a weak positive relationship: as the number of meals consumed increases, there is a slight tendency for weight change to also increase.
col_pair = c()
chisq_pv = c()
# pairwise chi-square tests of independence for every pair of categorical columns
# (the parentheses in 2:(length(cat_cols) - 1) are required; without them the sequence is wrong and the loop errors)
for (i in 2:(length(cat_cols) - 1))
{
  for (j in (i + 1):length(cat_cols))
  {
    col_pair <- c(col_pair, paste(cat_cols[i], ",", cat_cols[j]))
    ans <- chisq.test(df[[cat_cols[i]]], df[[cat_cols[j]]])$p.value
    chisq_pv <- c(chisq_pv, ans)
  }
}
chi_df = data.frame(Name = col_pair, PVal = chisq_pv)
head(chi_df[order(chi_df$PVal),], n=10) # the ten smallest p-values
## Name PVal
## 33 Stress_Busters , Most_Missed_Things 3.969857e-274
## 18 SosMed_Medium , Stress_Busters 6.805568e-233
## 8 OnlineClass_Rating , Most_Missed_Things 1.042444e-63
## 5 OnlineClass_Rating , Stress_Busters 1.740276e-33
## 21 SosMed_Medium , Most_Missed_Things 1.490637e-22
## 15 OnlineClass_Medium , Most_Missed_Things 5.783002e-21
## 6 OnlineClass_Rating , Time_Utilized 3.927739e-17
## 34 Time_Utilized , Connected_with_Family_Friends_Relatives 4.941566e-10
## 12 OnlineClass_Medium , Stress_Busters 2.114716e-07
## 7 OnlineClass_Rating , Connected_with_Family_Friends_Relatives 6.117431e-06
Some variable pairs show a strong connection: we reject the null hypothesis of independence whenever the p-value is below 0.05, which indicates that the two variables are not independent in the population from which this sample was drawn.
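For instance, on a hypothetical 2x2 table (not the survey data) with a strong diagonal pattern, chisq.test returns a p-value far below 0.05 and we reject independence:

```r
# hypothetical 2x2 contingency table: rows = time utilized, cols = connected
tab <- matrix(c(40, 10,
                10, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(Utilized  = c("YES", "NO"),
                              Connected = c("YES", "NO")))

test <- chisq.test(tab)
test$p.value   # far below 0.05, so we reject the independence hypothesis
```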
tbl1 <- table(df$OnlineClass_Rating, df$Time_Utilized) # creates a contingency table: a frequency table that displays the distribution of one categorical variable in rows and another in columns
print(PercTable(tbl1))
##
## NO YES
##
## Very poor freq 239 134
## perc 23.1% 13.0%
##
## Poor freq 14 11
## perc 1.4% 1.1%
##
## Average freq 174 162
## perc 16.8% 15.7%
##
## Good freq 89 126
## perc 8.6% 12.2%
##
## Excellent freq 11 74
## perc 1.1% 7.2%
##
print(PercTable(tbl1, margin=1)) # margin = 1 appends row totals (the Sum column)
##
## NO YES Sum
##
## Very poor freq 239 134 373
## perc 23.1% 13.0% 36.1%
##
## Poor freq 14 11 25
## perc 1.4% 1.1% 2.4%
##
## Average freq 174 162 336
## perc 16.8% 15.7% 32.5%
##
## Good freq 89 126 215
## perc 8.6% 12.2% 20.8%
##
## Excellent freq 11 74 85
## perc 1.1% 7.2% 8.2%
##
print(PercTable(tbl1, margin=2)) # margin = 2 appends column totals (the Sum row)
##
## NO YES
##
## Very poor freq 239 134
## perc 23.1% 13.0%
##
## Poor freq 14 11
## perc 1.4% 1.1%
##
## Average freq 174 162
## perc 16.8% 15.7%
##
## Good freq 89 126
## perc 8.6% 12.2%
##
## Excellent freq 11 74
## perc 1.1% 7.2%
##
## Sum freq 527 507
## perc 51.0% 49.0%
##
Respondents who felt they utilized their time well generally gave more positive online-class ratings than those who did not.
# computes a variety of association statistics for the contingency table, such as Chi-square test statistic, Cramér's V, and contingency coefficients, to determine the strength and significance of the association between the two categorical variables
assocstats(tbl1)
## X^2 df P(> X^2)
## Likelihood Ratio 89.102 4 0
## Pearson 83.052 4 0
##
## Phi-Coefficient : NA
## Contingency Coeff.: 0.273
## Cramer's V : 0.283
tbl1_plt = ggplot(df, aes(x = OnlineClass_Rating, fill = Time_Utilized)) +
  geom_bar(position = position_dodge(preserve = "single")) + # side-by-side bars; preserve = "single" keeps every bar the same width even when a group has only one bar
  labs(y = "Frequency (in tens)",
       title = "Online Class Rating by Time Utilized",
       fill = "Time Utilized?") +
  scale_y_continuous(labels = function(x) x / 10) # relabel the y-axis in tens
print(tbl1_plt)
There are no strict rules for interpreting Cramér's V, but as a guide: weak (0 to 0.2), moderate (0.2 to 0.3), strong (0.3 to 0.5), and redundant (0.5 to 0.99, indicating the variables likely measure the same concept).
In this case, there is a moderate association (V = 0.283) between the online class rating and time utilized.
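assocstats reports Cramér's V directly, but it can also be computed by hand from the Pearson chi-square statistic. A minimal sketch, assuming the standard definition V = sqrt(X^2 / (n * (min(rows, cols) - 1))):

```r
# Cramér's V from the Pearson chi-square statistic
cramers_v <- function(tbl) {
  X2 <- chisq.test(tbl, correct = FALSE)$statistic # uncorrected Pearson X^2
  n  <- sum(tbl)
  k  <- min(nrow(tbl), ncol(tbl)) - 1
  unname(sqrt(X2 / (n * k)))
}

# check on a hypothetical 2x2 table
toy_tbl <- matrix(c(40, 10, 10, 40), nrow = 2)
cramers_v(toy_tbl) # 0.6
```

chisq.test applies Yates' continuity correction to 2x2 tables by default, so correct = FALSE is needed to keep the plain Pearson statistic that V is defined on.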
tbl2 <- table(df$Connected_with_Family_Friends_Relatives, df$Time_Utilized)
print(PercTable(tbl2))
##
## NO YES
##
## NO freq 201 103
## perc 19.4% 10.0%
##
## YES freq 326 404
## perc 31.5% 39.1%
##
print(PercTable(tbl2, margin=1))
##
## NO YES Sum
##
## NO freq 201 103 304
## perc 19.4% 10.0% 29.4%
##
## YES freq 326 404 730
## perc 31.5% 39.1% 70.6%
##
print(PercTable(tbl2, margin=2))
##
## NO YES
##
## NO freq 201 103
## perc 19.4% 10.0%
##
## YES freq 326 404
## perc 31.5% 39.1%
##
## Sum freq 527 507
## perc 51.0% 49.0%
##
assocstats(tbl2)
## X^2 df P(> X^2)
## Likelihood Ratio 40.127 1 2.3801e-10
## Pearson 39.554 1 3.1906e-10
##
## Phi-Coefficient : 0.196
## Contingency Coeff.: 0.192
## Cramer's V : 0.196
tbl2_plt = ggplot(df, aes(x = Connected_with_Family_Friends_Relatives, fill = Time_Utilized)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  labs(y = "Frequency (in hundreds)",
       title = "Connected with Family, Friends, and Relatives? by Time Utilized",
       fill = "Time Utilized?") +
  scale_y_continuous(labels = function(x) x / 100) # relabel the y-axis in hundreds
print(tbl2_plt)
The findings show a connection between effective time management and stronger family bonds. Students who managed their time well reported closer relationships with their families. This is important as it highlights how good time management can enhance personal connections during challenging circumstances.
A few notable patterns and trends emerged from the investigation, providing valuable insights into the broader impacts of the pandemic.
Time Management Challenges: Just over half of respondents (51.44%) reported struggling with time management during the pandemic, highlighting how difficult many found it to organize their schedules around online learning.
Health Concerns: Approximately 13.62% of students mentioned experiencing health issues, indicating the pandemic’s considerable impact on their mental and physical well-being. This aligns with a global trend of prolonged lockdowns and lifestyle changes contributing to increased health worries.
Activity Preferences: When dividing their time among activities, students ranked exercise as their least favored. This neglect of physical activity could have adverse long-term effects on their overall health.
Family Connections: Effective time management was positively associated with stronger family ties. This suggests that improving time management skills can help students maintain closer relationships with their families, providing crucial emotional support during difficult times.
Stress Management: Listening to music and playing online video games emerged as popular methods for stress relief. These activities served as essential coping strategies to manage the stress and uncertainties brought by the pandemic.
No Direct Link Between Activity and Health: The absence of a clear correlation between activity levels and health issues indicates that various factors like nutrition, mental health, and existing health conditions influence students’ well-being.
The study reveals no direct link between students’ health and the time they spend on activities such as self-study and online courses. It’s concerning that more than half of the participants did not effectively manage their time during lockdowns, potentially impacting their health negatively. Addressing this issue involves supporting students in learning online and improving their time management skills. Additionally, the study highlights a lack of physical exercise among participants, which is critical for their overall well-being during these challenging times. Encouraging physical activity should be a priority to enhance their overall health.
Chaturvedi, K., Vishwakarma, D., & Singh, N. (2020). COVID-19 and its impact on education, social life and mental health of students: A survey. Children and Youth Services Review, 121. https://doi.org/10.1016/j.childyouth.2020.105866
Chaturvedi, K. (2020). COVID-19 and its impact on students [Data set]. Kaggle. https://www.kaggle.com/datasets/kunal28chaturvedi/covid19-and-its-impact-on-students/data